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Expander ^o-decoding 

Rodrigo Mendoza-Smith and Jared Tanner 


Abstract —We introduce two new algorithms, Serial-^o and 
ParalIel-£o for solving a large underdetermined linear system 
of equations y = Ax £ R"* when it is known that x £ R" has 
at most k < m nonzero entries and that A is the adjacency 
matrix of an unbalanced left d-regular expander graph. The 
matrices in this class are sparse and allow a highly efficient 
implementation. A number of algorithms have been designed to 
work exclusively under this setting, composing the branch of 
combinatorial compressed-sensing (CCS). 

Serial-^o and Parallel-^o iteratively minimise \\y — Ax\\o by 
successfully combining two desirable features of previous CCS 
algorithms: the information-preserving strategy of ER m, and 
the parallel updating mechanism of SMP O. We are able to 
link these elements and guarantee convergence in 0{dn\ogk) 
operations by assuming that the signal is dissociated, meaning 
that all of the 2^ subset sums of the support of x are pairwise 
different. However, we observe empirically that the signal need 
not be exactly dissociated in practice. Moreover, we observe 
Serial-^o and Parallel-^o to be able to solve large scale problems 
with a larger fraction of nonzeros than other algorithms when 
the number of measurements is substantially less than the signal 
length; in particular, they are able to reliably solve for a fc-sparse 
vector X £ R" from m expander measurements with n/m — 10^ 
and k/m up to four times greater than what is achievable by £i- 
regularization from dense Gaussian measurements. Additionally, 
due to their low computational complexity, Serial-^o and Parallel- 
£o are observed to be able to solve large problems sizes in 
substantially less time than other algorithms for compressed 
sensing. In particular, Parallel-^o is structured to take advantage 
of massively parallel architectures. 


I. Introduction 

Compressed sensing El 01 s Ha I considers the problem 
of sampling and efficiently reconstructing a compressible bnite 
dimensional signal x £ K" from far fewer measurements 
than what Nyquist and Shannon deemed possible Emni. In 
its simplest form compressed sensing states that if x £ K" 
has at most k < n nonzero entries, then it can be sampled 
from m linear measurements y = Ax £ K™ and that x 
can be recovered from {y, A) with computationally efficient 
algorithms provided m < n is sufficiently large, see im. 

The most widely studied sensing matrices A are from 
the classes of; a) Gaussian or uniformly drawn projections 
which are most amenable to precise analysis due to their 
spherical symmetry, and b) partial Fourier matrices which 
have important applications for tomography and have fast 
transforms allowing A and A* to be applied in O(nlogn) 
operations. Unfortunately the partial Fourier matrices are not 
known to allow the asymptotically optimal order number of 
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measurements of m ^ k ^ n, rather the best analysis ensures 
recovery for m ^ k log® n ifTTl . Sparse binary matrices with a 
fixed number of non-zeros per column offer the possibility of 
A and A* being applied in 0{n) time and for asymptotically 
optimal order number of measurements m ^ k ^ n inaini. 
When restricting to these matrices, compressed sensing is 
referred to as combinatorial compressed sensing, m. 

A. Combinatorial compressed sensing 

The problem of sparse recovery with compressed sensing 
resembles the problem of linear sketching in theoretical com¬ 
puter science. This area considers sketching high dimensional 
vectors x £ K" using a sparse matrix A £ with the aim 

that Ax has lower dimensionality than x, but still preserves 
some of its properties with high probability. In an attempt 
to reconcile this area with the compressed-sensing paradigm, 
m proposed sensing x £ Xk using an expander matrix, 
i.e. the adjacency matrix an unbalanced bipartite graph with 
high connectivity propertied We denote the m x n matrices 
in this class by but abbreviate to Efc^^^d when the 

size is understood by its context. Expander matrices E™^^ 
are sparse binary matrices with d m ones per column, 
but with their nonzeros distributed in such a way that any 
submatrix composed of k columns has at least (1 — e)kd 
rows which are nonzero 0 This structure makes them suitable 
for sparse recovery, and also makes them low complexity in 
terms of storage, generation, and computation (see Table |^. 
Additionally, some applications like the single-pixel camera 
G1 consider measurement devices with binary sensors that 
inherently correspond to binary and sparse inner products, and 
that unfortunately, fall outside the set of matrices for which 
the widely used restricted isometry techniques apply. 

The authors of na showed that, although being sparse, 
expander matrices can sense elements in Xk ut the optimal 
measurement rate 0{k \og{k/m)), and that these can be recov¬ 
ered accurately and efficiently via -regularization. Following 
this result, a series of algorithms designed specihcally to work 
with expander matrices was presented in millinillll. The 
analysis of these algorithms requires the use of techniques and 
ideas borrowed from combinatorics, which is why this branch 
was labeled by as combinatorial compressed sensing 
(CCS). It is in this realm that we make our main contributions. 


B. Main contributions 

Our work is in the nexus of a series of papers ID mini 061 
proposing iterative greedy algorithms for combinatorial com- 


II-B 


for details. 


^See Section ! 

^ Such expander matrices can be generated by drawing i.i.d. columns with 
the location of their nonzeros drawn uniformly from the (^) support sets of 
cardinality d, mi 
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Storage 

Generation 

A*y 

m 

Gaussian/Bernoulli 

0{mn) 

0(mn) 

0{mn) 

0(k log(n/fe)) 

Partial Fourier 

0(m) 

0(m) 

0{n log n) 

C)(fclog5(n)) 

Expander 

0{dn) 

0{dn) 

0(dn) 

0(k log(n/fe)) 


TABLE I: Complexity of measurement operators. 


pressed sensing. The algorithms put forward in the aforemen¬ 
tioned sequence of papers recover the sparsest solution of a 
large underdetermined linear system of equations y = Ax by 
iteratively refining an estimation x using information about the 
residual r = y — Ax. Though these algorithms have the same 
high-level perspectiv^ their particulars are optimised to best 
tradeoff speed, robustness, and recovery region; see Table |I^ 
for a summary. For instance, at each iteration, SMP m updates 
several entries of x in parallel, allowing it to provably recover 
an arbitrary x & x'k ™ (log II2^111) iterations of complexity 
0{dn -f nlogn). However, SMP is only able to recover the 
sparsest solution when the fraction of nonzeros in the signal 
is substantially less than other compressed sensing algorithms. 
On the other hand, at each iteration, LDDSR ifT^ and ER m 
update a single entry of x in such a way that a contraction 
of \\y — Aijlo is guaranteed. This reduction in the residual’s 
sparsity is achieved by exploiting an important property of 
expander graphs, which we call the information-preserving 
property (see Theorem |11.4[ ). Essentially, this property guar¬ 
antees that most of the entries from x will appear repeatedly 
as entries in y = Ax. In other words, it guarantees that for 
most i e [m], we will have yi C {xj : j G supp(x)}. 
In Ha and m, this property is used to give sufficient 
conditions for decrease of \\y — Ax\\o under the regime of 
single updating of x. However, this regime of single updating 
in LDDSR and ER typically requires greater computational 
time than existing compressed-sensing algorithms. Our main 
contribution is in the design and analysis of an algorithmic 
model that successfully combines the information-preserving 
strategy of LDDSR and ER with the parallel updating scheme 
of SMR This synthesis is made possible by assuming that the 
signal of interest is dissociated. 

Definition I.l (Dissociated signals). A signal x G K” is 
dissociated if 

V ri,T 2 C supp(a;) s.t. Ti ^ T 2 . (1) 

26T1 JGT2 

The name dissociated comes from the field of additive 
combinatorics (See Definition 4.32 in ini), where a set S 
is called dissociated if the set of all sums of distinct elements 
of S has maximal cardinality. Even though the model o 
might seem restrictive, it need not be exactly fulfilled for 
our algorithm to work. In fact, it is fulfilled almost surely 
for isotropic signals, and more generally by any signal whose 
nonzeros can be modelled as being drawn from a continuous 
distribution. Moreover, it is discussed in Section |IV-C3| that 

^See Section [5] and Table [n| 


non-dissociated signals, such as integer or binary signals, 
can be recovered if instead the columns of A are scaled 
by dissociated values, and the nonzeros of x are drawn 
independently of A. Also, numerical experiments show that 
the algorithm recovery ability decreases gracefully as the 
dissociated property is lost by having a fraction of the nonzeros 
in X be equal, see Figure [TO] . 

With this assumption, our contributions are a form of model- 
based compressed sensing ifTSll in which apart from assuming 
X G Xk’ assumes special dependencies between 

the values of its nonzeros with the goal to improve the 
algorithms speed or recovery ability. Our contributions are 
Serial-fo and Parallel-fo, Algorithms and [^respectively, and 
their convergence guarantees summarised in Theorem |I.2| 


Algorithm 1; Serial-fo 

Data; A G y G M™; a G (1, d] 

Result: x G M" s.t. y = Ax 

d ^ 0, r ^ y; 

while not converged do 

for j G [n] do 

T G {wj G K : ||r||o - ||r - Wj-a^ Ho > a}; 
for (jjj G T do 

I Xj ^ Xj -f ujj ; 

end 

r y — Ax\ 

end 

end 


Algorithm 2; Parallel-fo 

Data; A G y G M’"; a G (1, d] 

Result: x G K" s.t. y = Ax 

d ^ 0, r ■<— y; 

while not converged do 

T ^ G [n] X K : ||r||o - \\r-ujjaj\\o > a}; 

for (j,u!j) G T do 

I Xj i — Xj-\- CJj\ 

end 

r y — Ax\ 

end 


Theorem 1.2 (Convergence of Expander fp-Decoders). Let 
^eE->=;and e < 1/4. and a; G Xfc be a dissociated signal. 
Then, Serial-fp and Parallel-fp with a = (1 —2e)d can recover 
X from y = Ax G in 0{dn\ogk) operations. 

The focus of this paper is on charting the development 
of Serial-fg and Parallel-£o and on proving Theorem |I.2| In 
doing so, we contrast Serial-fp and Parallel-fp to the state- 
of-the-art algorithms for compressed-sensing and show that 
when the signal is dissociated, these are the fastest algorithms 
available when implemented, respectively, in a serial or a 
parallel architecture. We support these claims with a series 
of numerical experiments that additionally show that any 
loss in universality due to our signal model is traded off by 



















MENDOZA-SMITH AND TANNER: EXPANDER (.q-DECODING 


3 


unusually high recovery regions when 5 := m/n is small and 
substantially higher than those of previous CCS algorithms. 


C. Outline 

Section |I^ gives the main background theory in expander 


graphs necessary for our discussion. Then, Section III reviews 
past advances in CCS, putting emphasis on deconstructing 
these into their essential ideas, and on pointing out common 
elements between them. Section EYI contains our main contri¬ 
butions: Serial-£o and Parallel-^o- We prove Theorem |I.2| and 
point out some technical details regarding the implementation 
of Serial-^o and Parallel-fp- We also discuss some connections 
of the dissociated model Q with Information Theory. Finally, 
in Section [V] we evaluate the empirical performance of these 
algorithms with a series of numerical experiments. 


II. Background 

In this section, we present the basic notions of graph theory 
that are necessary for understanding our subsequent analyses, 
as well as the relevant previous work in combinatorial com¬ 
pressed sensing. We start by defining some notation. 


A. Notation 

For a subset 5" C fl, we let [S'! be its cardinality, and n\S 
denote its complement. We adopt notation from combinatorics 
and use the shorthand [n] := {1, ..., n} for n G N. We also 
define = {S C [n] : [S’] = k} and = {S C 

[n] : 15”I < fc}. As mentioned in the previous section, for 
X G K", we let supp(a:) = {i : Xi ^ 0} be its support, and 
argsupp(a:) = {xi : i G supp(a:)} be the set of nonzero values 
in X. With this, we define ||x||o = |supp(x)|, and = {x G 
K" : ||x||o < k}-, vectors in Xk fc-sparse. We 

let Hk : K" —?► Xk thresholding operator that sets 

to zero all but the k largest elements in x. Throughout this 
work, we implicitly assume that x G K”, y G ffi™, and that 
A G is a binary sparse matrix with d ones per column. 

It is also implicitly assumed that m < n and that ||x||o < m. 
For a given signal x, we will use k to refer to its sparsity, 
unless we specify otherwise. 


B. Expander graphs 

A bipartite graph is a 3-tuple G = {U, V, E) such that 
UfW = 0 and E C U xV. Elements in UUV are called nodes, 
while tuples in E are called edges. Under the assumption that 
|f7| = n and |U| = m, we abuse notation and let U = [n] be 
the set of left-nodes, and V = [m] be the set of right-nodes. 
A bipartite graph is said to be left d-regular if the number of 
edges emanating from each left node is identically d, and is 
said to be unbalanced if m < n. For S C U U V we define 
JV{S) C U UV to be the neighbourhood of S, i.e. the set of 
nodes in C/ U U that are connected to S through an element 
of E. We note that for bipartite graphs, A/’(S') C V only if 
S C U, and M{S) C U only if S' C U. An expander graph 
(Figure is an unbalanced, left d-regular, bipartite graph that 
is well-connected in the sense of the following definition. 


Definition II.l (Expander graph). An unbalanced, left d- 
regular, bipartite graph G — {[n],[m],E) is a {k,e,d)- 
expander if 

\Af{S)\ > (1 - e)d|S| V S G (2) 

We call e G (0,1) the expansion parameter of the graph. 


m] 



Eig. 1: Schematic of an expander graph with d = 3. Every 
left d-regular bipartite graph is an expander for some k and e. 


Hence, the expander graphs that we consider can be thought 
of as tuples G = ([n], [m], E) such that all subsets S G 
have at most ed|S| fewer neighbours than the number of 
edges emanating from S. It will be convenient to think of 
an expander in linear algebra terms, which can be done via its 
adjacency matrix. 


Definition II.2 (Expander matrix The adjacency 

matrix of an unbalanced, left d-regular, bipartite graph G = 
{[n],[m], E) is the binary sparse matrix A G defined 

as 


Aij — 


1 i G Af{j) C [m] 

0 otherwise. 


(3) 


We let Efc^e d be a the set of adjacency matrices of {k,£,d)- 
expander graphs. 

We note that A G E™^^ is a sparse binary matrix with 
exactly d ones per column, and also that any left d-regular 
bipartite graph will satisfy for some k and e. As mentioned 
previously, ini showed that these matrices possess a bounded 
restricted isometry constant (RIC) in the ii norm in the linear 
growth asymptotic where /c ^ to ~ n —>■ oo; making these 
matrices computationally highly attractive for compressed 
sensing. The existence of expander graphs with optimal mea¬ 
surement rate of to = O{klog{n/k)), is addressed in the 
following theorem. 

Theorem II.3 (Existence of optimal expanders lfT9l I^OlH . Eor 
any n/2 > k > 1 and e > 0, there is a (fc, e, d)-expander with 

d = 0{log{n/k)/£), and to = C>(/clog(n/fc)/e^). (4) 


Theorem |II.3| also implies that in the linear growth asymp¬ 
totic offc^TO^n—>^oo and for a fixed e > 0, it holds that 
d = 0(1); that is, the number of nonzeros per column does not 
increase with the problem size. Apart from this fact, expander 
matrices are of interest in compressed sensing because they 
are nearly information preserving, meaning that for x G Xfe 
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least (1 — 2e)kd entries of y = Ax equal a nonzero value of 


X. This property is guaranteed by Lemma II.4 


Lemma II.4 (Information-preserving property). Let G = 
([n], [m],E) be an unbalanced, left d-regular, bipartite graph, 
and S G Debne, 

Afi{S) = {iGAf{S):\J\f{i)r\S\ = l}, (5) 


and 

Ar>iiS)=JV{S)\J\fi{S). (6) 

Then, G is a (fc, s, d)-expander graph if and only if 

\Afi{S)\ > (1 - 2e)d\S\ V S' e (7) 

Proof: See Appendix [A| ■ 

The information-preserving property is widely used in the 
analysis of CCS, and is a central piece in the analysis of our 
algorithms as it implies the lower ^i-RIC bound US. Finally, 
we remark that adjacency matrices of expander graphs are 
not only useful for compressed-sensing, but also for a number 
of applications including linear sketching, data-stream com¬ 
puting, graph sketching, combinatorial group testing, network 
routing, error-correcting codes, fault-tolerance, and distributed 
storage lElllol. 


III. Overview of CCS prior art 

Iterative greedy algorithms for compressed sensing seek the 
sparsest solution to a large underdetermined linear system 
of equations y = Ax and typically do so by operating on 
the residual r = y — Ax, where x is an estimate of the 
sparsest solution. Algorithms for combinatorial compressed 
sensing differ by considering updating the entry of the 
approximation, xj, based on a non inner product score sj G ffi 
dependent on r_\f(^jy, that is, on the residual restricted to the 
support set of the column of A. In order to standardise the 
convergence rate guarantees of previous CCS, we dehne the 
notion of an iteration as follows. 

Definition III.l (Iteration). Let A G x G M", and y = 

Ax. For an iterative greedy algorithm updating an estimation 
X G M" of X from a residual r = y—Ax, an iteration is defined 
as the sequence of steps performed between two updates of r. 

In the remainder of this section we deconstruct past CCS 
algorithms into their essential components so as to give a high- 
level overview of their shared characteristics. 


A. Sparse Matching Pursuit (SMP) 

SMP was proposed in Q to decode x from y = Ax with 
a voting-like mechanism in the spirit of the count-median 
algorithm from data-stream computing (see EH for details). 
SMP can also be viewed as an expander adaptation of the 
Iterative Hard Thresholding algorithm (IHT) EDj which uses 
the line-search x G- Hk[x-\-p] to minimise ||?/—AxlH over 
indeed it was rediscovered from this perspective in ifTTl lpp. 
452] where it is referred to as EIHT. Due to the structure of 
expander matrices, SMP chooses the direction p = A4{y—Ax) 
with M : K™ —K” defined as 

[Af(r)]j = median(rAA(j)). (8) 


After thresholding, this choice yields the iteration, 

X ^ Hk[x-\- H 2 k [M {y - Ax)]]. (9) 

SMP and its theoretical guarantees are stated in Algorithmic 
and Theorem IIII.2I 


Algorithm 3: SMP El 
Data; A G y G 

Result; x s K” s.t. ||x — i||i = 0{]]y — Ax]]i/d) 

X 4— 0, r 4— y; 

while not converged do 

i 4- iTfe [x -f H2k[M{r)]\, 
r <— y — Ax; 
end 


Theorem III.2 (SMP El)- Let A G and lety = Ax-hy 
for X G Xk- Then, there exists an e <C 1 such that SMP 
recovers x G M” such that ||x — x||i = G(||77||i/d). The 
algorithm terminates in G(log((i||x||i/||?7||i)) iterations with 
complexity 0{nd -\- nlogn). 


B. Sequential Sparse Matching Pursuit (SSMP) 

It was observed in ifTSll that SMP typically failed to converge 
to the sought sparsest solution when the problem parameters 
fall outside the region of theoretical guarantees. Though SMP 
updates each entry in x to individually reduce the £i norm 
of the residual, by updating multiple values of x in parallel 
causes SMP to diverge even for moderately small ratios of 
k/m. To overcome these limitations, the authors proposed 
SSMP, which updates x sequentially rather than in parallel. 
That is, at each iteration, SSMP will look for a single node 
j G [n] and an update w S M that minimise ||r — ujaj]\i, 
which can be found by computing argmax^g^^^ A^(r), see 
the discussion in Section III-E2 This approach results in a 
strict decrease in ||r||i, but the sequential update results in 
an overall increase in computational complexity, see Table 
SSMP and its theoretical guarantees are stated in AlgorithrnjC 
and Theorem IIII.3I 


Algorithm 4: SSMP ifBl 

Data; A G y G c> 1; 

Result; x S K" s.t. ||x — x||i = 0{]\y — Ax]\i/d) 

X 4— 0, r 4— y; 
while not converged do 

Eind (j, w) G [n] X M s.t. ||r — wajj|i is minimized; 
Xj G- Xj -f w; 

Perform x G- Hk]x] every (c — l)k iterations; 
r y — Ax; 


Theorem III.3 (SSMP ifTsl l. Let A G ^{cGi)k,e,d and let 
y = Ax -f rj for x G Xk- Then, there exists an e <C 1 such 
that SSMP with fixed c > 1 recovers x G M" such that ||x — 
x||i = G(||p||i). The algorithm terminates in 0{k) iterations 
of complexity O +n-\- logn) log llxlli^ 
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C. Left Degree Dependent Signal Recovery (LDDSR) 

LDDSR was proposed in lfT6l and decodes by exploiting 
the information preserving property given in Lemma |II.4| 
The main insight is that one can lower bound the number 
of elements in {i G [m] : yi G argsupp(a;)}, and use the 
structure of A to find a j G [n] and a nonzero value w G M 
that appears more than d/2 times in G It is shown 

in ifT^ that updating xj ^ xj + uj guarantees a decrease in 
||r||o when e = 1/4. LDDSR and its theoretical guarantees 
are stated in Algorithm and Theorem IIL4 


Algorithm 5: LDDSR Ub) 

Data; A G y 

G K™ 

Result; i G M" s.t. 

y = Ax 

X 

^ 0, r y'. 


while not converged 

do 


Find (j, w) G [n] 

X K \ {0} s.t. 


|{i G A/'(j) ; r, = 

= a;}|>i; 


Xj G- Xj + ui; 



r y — Ax; 


end 



circumvented by selecting the node to update by 

argmax |{i G Af{j) : Vi = mode(rAro))}| • (10) 

J6N 

E. Discussion 

Having introduced these algorithms, we now point out some 
important commonalities between them. 

1) Iterative greedy algorithms: The CCS algorithms we 
have presented share the structure of Algorithm]^ 


Algorithm 7: Iterative greedy CCS algorithms 

Data; A G y G K™ 

Result; i G K" s.t. y = Ax 

X •<— 0, r •<— y; 

while not converged do 

Compute a score Sj and an update Uj V / G [n]; 
Select S C [n] based on a rule on 
Xj G- Xj + Uj for j G S', 
fc-threshold i; 
r ^ y — Ax', 
end 


Theorem III.4 (LDDSR QD). Let A G with e = 1/4 

and X G Xk- Given y = Ax, LDDSR recovers x in at most 
0{dk) iterations with complexity 0{^ + n). 


D. Expander Recovery (ER) 

ER m differs from LDDSR by considering e < 1/4 and 
suitably adapting the set of indices from which an entry in 
X may be updated. This modification allows the number of 
iterations guaranteed to be improved, see Theorem IIL4 In 
particular, ER gurantees convergence in 0{k) iterations of 
complexity 0{nd). ER and its theoretical guarantees are stated 
in Algorithm and Theorem |IIL5| 


Algorithm 6: ER Ill 

Data; Age™/';; y 

G K™ 

Result; x G K” s.t. 

II 

H> 

X 

^ 0, r ^ y; 


while not converged 

do 


Find (j, w) G [n] 

X M \ {0} s.t. 


|{i G M{j) '. r, = 

<^} > (1 ~ 2e)d.; 


Xj G- Xj + Lo; 



r y — Ax; 


end 



Theorem III.5 (ER H]). Let A G '^ 2 k,s,d with £ < 1/4 and 
m = 0{k log(n/fc)). Then, for any x G Xk’ given y = Ax, ER 
recovers x in at most 0{k) iterations of complexity _|_ 

n). 

Though ER seemingly requires knowledge of e to imple¬ 
ment, which is NP-hard to compute, knowledge of £ can be 


The dominant computational cost in CCS greedy algorithms 
is concentrated in computing Sj and Uj, and in selecting the 
set S of nodes that will be updated. At each step of these 
algorithms, a subset S C [n] is selected. In SMP, we have 
S = \n] which makes it of sublinear complexity in ||a;||i, but 
typically diverges for even moderate values of p '.= k/m. All 
other algorithms update a single entry of x per iteration; that 
is, they choose S C [n] with IS"! = 1. This brings benefits in 
terms of convergence and recovery region, but compromises 
the computational complexity of the algorithms. A summary 
of these properties is given in Table 

2) Median minimises ||r||i.' The operation median(r_A/(j)) 
can be recast as the problem of finding the scalar w G K that 
minimises ||r — uiajWi. To see this, note that the function 

\\r-ujaj\\i = E I Ti — w I + constant (11) 

is at a minimum when |{i G ff{j) ; rj — w > 0}| = |{i G 
■^{j) : t'i — w < 0}|. Then, by definition of the median, 

argmin ||r — wajlli = median(rj\/(j)) (12) 


This is independent of the expansion parameter e. 

3) Mode does not minimise ||r||o.' In Giiia, it is shown 
that Algorithms 1^ and 1^ use Lemma 11.4 to find a pair (j, w) 
such that 


\\y - A{x + ujej)\\o < \\y - Ax\\o - (1 - 4£)d. (13) 

However, when (y — does not contain any zeros, we 

can guarantee that 

\\y - A{x + we^Ollo < Wv - ^^llo - (1 - 2£)d. (14) 

Eor dissociated signals, where X)jGsupp(a:) 

always ensure that the greater contraction rate will be achieved. 
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4) Updating Sj and uj: Algorithmsneed to compute 
a score sj = Sj{r_\f(^j')) for each j G [n], which can be done 
at cost 0{dn). It is important to note that they do not need to 
recompute all the scores at each iteration. A common strategy 
is to compute each of the scores once and store them with 
their corresponding node j G [n\ in some data structure (like 
priority queues ca or red-black trees 12). Then, at each 
iteration, we can efficiently request the node j G [n] that 
maximises the score (median, mode, etc.) and use it to update 
Xj. This update will affect d = |A/'(j)| entries of the residual, 
so we only need to recompute the scores corresponding to 
I U*gAAO)-^(*)I = 0{d^nlm) right nodes. 

IV. Main CONTRIBUTIONS: Iterative £o-minimisation 

Our main contributions, Serial-fp and Parallel-£o, advance 
combinatorial compressed sensing by having comparatively 
high phase transitions while retaining the low computa¬ 
tional complexity of SMP and the parallel implementation 
of LDDSR. In particular, Parallel-^o is observed to typically 
recover the sparsest solution of underdetermined systems of 
equations in less time than any other compressed sensing 
algorithm when the signal is dissociated and the sensing matrix 
is an expander graph. 

Serial-£o and Parallel-fg look for a solution by identifying 
nodes which if updated sequentially would strictly reduce the 
||r||o by at least a. That is, they will choose a coordinate j of 
X, and an update value to such that, 

{j,uj)G[n]xR s.t. ||r||o - llr-wajllo > a, (15) 


A. Technical lemmas 

Lemma IV.l (Properties of dissociated signals). Let x G Xk 
be dissociated. Then, 

(i) Xi ^ Xj ^ i,j G supp(x), i ^ j. 

(ii) Ejgt V 0 ^ r C supp(a;). 

Proof: The result follows from 0 . For (i) we set Ti = 
{i} and T 2 = {j}, and for (ii) we let T 2 = 0 . ■ 

Lemma IV.2 (Bounded frequency of values in expander mea¬ 
surements of dissociated signals). Let a; S Xfc be dissociated, 
A G and uj a nonzero value in Ax. Then, there is a 

unique set T C supp(a:) such that w = '■b® value 

w occurs in y at most d times, 

|{i G [to] : Pi = w}| < d V w 7 ^ 0. (16) 

Proof: The uniqueness of the set T C supp(a;) such that 
UJ = follows by the definition of dissociated. Since 

|A/'(j)| = d for all j G [n], we have that. 


\{i G [to] : yi = w}| 


n -^(J) 

JCT 


< lAfOo)! = d 


(17) 


for any jo G T. ■ 

Lemma IV.3 (Pairwise column overlap). Let A G If 

s < 1/4, every pair of columns of A intersect in less than 

(1 — 2£)d rows, that is, for all ji, j 2 G [n] with ji j 2 

Af(/i)n-^ 02 )| < (l- 2 e)d. (18) 


for some a G (l,(i]. By selecting a pair (j,uj) satisfying 
(15 1 , Serial-fo yields a decrease in ||r||o at every update, 
and is guaranteed to converge in C)(n log k) iterations of 
computational complexity 0 (d) if the signal is dissociated. 
Parallel-^o is designed similarly, but adapted to be able to take 
full advantage of modern massively parallel computational 
resources. Indeed, Parallel-fg selects and update all pairs (j, ui) 
satisfying Gl) and updates these values in x in parallel. 
Under this updating scheme, a strict contraction in ||r||o is 
guaranteed at every iteration when the signal is dissociated 
and a = (1 — 2e)d with e < 1/4, though we show in Section 
[V]that one can fix a = 2 and get high phase transitions and 
exceptional speed. 

Section [TV-A| presents the key technical lemmas that explain 
the behaviour of an iteration of Serial-fo and Parallel-^o- In 
particular, technical lemmas are stated to show how often 
values in Ax appear when x G Xk ^ ^ ^'ke'd^ '■b^t 

when a value in Ax appears sufficiently often it must be a 
value from a; at a specified location. This property ensures 
the algorithm updates its approximation x with values Xj in 
the entry, that is with the exact values from x at the 


correct locations. The dissociated signal model. Definition I.l 


is an essential component in the analysis presented in Section 


IV-A though we will observe that the algorithms’ recovery 


region degrade gracefully as the fraction of duplicate entries 
in X increases. The convergence rate of Serial-fg and Parallel- 
fo are presented in Section IV-B and together they establish 
Theorem 11.21 


Proof: Let S C [n\ be such that [S'! = 2 then 

|AA(S')| >2(l-e)d>2d-(l-2e)d, (19) 


where the hrst inequality is Definition jll.lj and the second 
inequality follows from e < 1/4. However, |A/'(S')| can be 
rewritten as 


W{S)\ = |AA(ji)| + |AA(j 2 )| - AA(ji)n-^(72)|, (20) 


for some ji,j 2 G Coupling (|^ with ([T^ gives m 


Lemma IV.4 (Progress). Let y = Ax for dissociated x G Xk 
and A G E™/'^ with e < 1/4. There is a pair (j, w) G [n] x M 
such that 


l{i G Af(j) : yi = uj}l > (1 - 2£)d. (21) 

Proof: Let S = supp(a;), then by the information¬ 
preserving property (0 it holds that |A/i(S')| > (1 — 2£)d|S'|, 
where A//(S') is defined in (|^, or alternatively, by Mi(S) = 
{i G [to] : yi = Xj^j G S} in the context of dissociated 
signals. Given the lower bound in |A/i(S)| > (1 — 2£)d\S\, 
if |S| 7 ^ 0, at least one j G S must have at least (1 — 2£)d 
neighbours in y with identical nonzero entries. Letting ui take 
the value of such repeated nonzeros in y gives the required 
pair (j, uj) G [n] x K. ■ 

Lemma IV.S (Support identihcation). Let y = Ax for disso¬ 
ciated X G Xk ^ S "'bb e < 1/4. Let w 7 ^ 0 be 

such that (|2 T]i and uj = xj. 
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Objective 

Score 

Signal 

Concurrency 

Number of iterations 

Iteration cost 


SMP (3 

h 

median 

any 

parallel 

0(log ||x||i) 

0{nd + n log n) 


SSMP (Bl 

u 

median 

any 

serial 

Oik) 

OiiPnlm -1- n -1- (^ logn) log ||3;||i) 

c3 

_o 

LDDSR dSl 

4 

mode 

any 

serial 

Oidk) 

Oi^+n) 

''171 ’ 

0. 

parallel-LDDSR 

4 

mode 

dissociated 

parallel 

0(\ogk) 

Oind) 


ER (TJ 

4 

mode 

any 

serial 

Oik) 

0(^+n) 

C/J 

C 

D 

serial -^0 

4 

^ 0 -decrease 

dissociated 

serial 

0{n log k) 

Oid) 

C 

O 

U 

parallel-£o 

io 

£o-decrease 

dissociated 

parallel 

C’(logfc) 

Oind) 


TABLE II; Summary of prior art in combinatorial compressed-sensing. 


Proof: Our claim is that for any uj which is a nonzero 
value from y, if the cardinality condition (|2T| is satisfied then 


the value uj = 


£T_ 


occurs for the set T being a singleton, 


|r| = 1. Lemma IV.2 states that T is unique and that 


|{i e A/'(j) : Vi = w}! = 


n -^(j) 


JCT 


( 22 ) 


If \T\ > 1 then the above is not more than the cardinality of 
the intersection of any two of the sets Af{ji) and N{j 2 ), and 
by ( [T8] | in Lemma |IV.3 that is less than (1 — 2£)d which 
contradicts the cardinality condition and consequently 
|r| < 1. However, Lemma IV.4 guarantees that \T\ > 0, so 
jrj = 1 and oj = Xj. ■ 

Equipped with Lemmas |IV. 1 1 - |IV.5| we prove Theorem 
[^considering Serial-fo and Parallel-fg separately, beginning 
with the later. Note that since x G Xk algorithm only 

sets entries in x to the correct values of x, then x — x G Xk’ 
and Lemmas |IV.4 and IV.5 hold with y replaced by r = 
y — A{x — x). 


B. Proof of Theorem 1.2 


Theorem IV.6 (Convergence of Parallel-fo)- Let A G 
and let £ < 1/4, and x S Xfc be dissociated. Then, Parallel-^o 
with a = (1 — 2e)d can recover x from y = Ax G K"* in 
O(logk) iterations of complexity 0(dn). 


Proof: Let x = 0 be our initial approximation to x € x^. 
During the iteration of Parallel-^o. let Si = supp(x — x) 
and include a subscript on the identificat ion s et T = Ti C [n], 

and the required 


IV.5 


As A S ^'ke'd ^ttd e < 1/4, by Lemma 
entry-wise reduction in the residual by at leak a = (1 — 2£)d, 
it follows that Parallel-fo only sets entries in x to the correct 
values of x and as a result ||x — x||o < ||^r||o = k for every 
iteration. Moreover, by Lemma IV.4 the set Ti long 

as X ^ X, so the algorithm eventually converges. 

In fact, we show that the rate of reduction of ||x — x||o per 
iteration is by at least a fixed fraction As A S E™/^^ 

has d nonzeros per column, the reduction in the cardinality of 
the residual, say ||r^||o — ||o. can be at most d\Ti\. That 


is, 

\\r%-\\r^+^\\o<d\Ti\. (23) 

To establish a fractional decrease in |5'£+i| we develop a lower 
bound on Ik^Ho — ||o- For Q C Si define the set A/’f*(Q) 

to be the set of nodes in Ni{Si) and such that i G JV{j) for 
some j G Q, i.e. 

AAf^(g) = {tG JfiiSi) : t G G Q}. (24) 

Consider the partition Si = TiU {Si \ Ti) and rewrite Ni{Si) 
as the disjoint union 

Ml {Si) = AAf ^ {Ti) U AAf ^ \ Ti). (25) 

Note that Mi^{Ti) MiiTi), and that by ( |24l l and the 
dissociated signal model, Mi{Ti) C [m] is the set of indices 
in that are identical to a nonzero in x and that have a 
frequency of at least a = (1 — 2£)d, so 

||r^||o-||r^+'llo>|A^f'(r,)|. (26) 


At iteration f, if Ti = Si, the full support of x is correctly 
identified, so x = x after updating x. Otherwise, Ti Si and 
the set Si\Ti is not identified by the algorithm at this iteration. 
We derive a lower bound on \Afi‘{Ti)\ by considering two 
cases: a G N and a ^ N. 

If a G N, then each node in Si \ Ti has at most a — 1 
duplicates in r^, so 


Mf‘{Si\Ti) <{a-l)\Si\Ti\. 


(27) 


Using the the information-preserving property 0 and the 
identity given in (|24ll it follows that 


\Aff^{Ti)\ + \M,^^{Si\Ti)\ 

> {l-2£)d{\Ti\ + \Si\Ti\) 

= (1 - 2£)d|r,| + \ Ti\ + (a - 1)1^^ \ Ti\. (28) 


Now, using (10 to lower bound ( |28] l, and solving for 
\Mi"{Ti)\ gives 

\Mf^{Ti)\ > (1 - 2£)d|T,| + |5, \ 7>|. 


(29) 
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By coupling ( |29l l, ( |26| l, and ( |23| l into a chain of inequalities it 
is seen that 


(l- 2 £)d|T,| + (|:S,|-|T,|)<d|r,|, 


(30) 


which simplihes to 


\TA> 


1 




(31) 


l + 2ed' 

If a ^ N, then each node in Si \ Ti has at most [aj 
duplicates in so 


J^f‘{Si\Ti) < LaJ|5,\7i|. 


(32) 


Similarly as in the former case, using ( |24| ) and the the 
information-preserving property we obtain 

> il-2e)d{\Ti\ + \Si\Ti\) 

= il-2e)d\Ti\ + {a-[a\)\Si\Ti\ + [a\\Si\Ti\. (33) 

Just as in the previous case, ( [33] l is bounded from below using 
(j3^, and the resulting inequality is used to get 

Wf‘{Ti)\ > {l-2e)d\Ti\ + {a- [aJ)|5, \ (34) 

Inequalities ( [34l i, ( [26] l, and ( |2^ are then used to derive 

a\Ti\ + (a - LaJ)(|5r| - |Tr|) < d|T,|. (35) 

It follows from a = {1 — 2e)d ^ N and the properties of step 
functions that (|15|) is equivalent to 


|r,l > 1 |a|. 


(36) 


1 -f [2£dJ 

Finally, note that ( |3^ reduces to ( [3T] ) when a G N, so using 
Si+i = Si\Ti and ( [36l l, we conclude that 

(37) 




Since |S'o| = k it follows that Parallel-£o will have con¬ 
verged after £* iterations when fc( 2 £d/(l-|- [ 2 edJ))^* < 1 , 
which is achieved for 

Each iteration of Parallel-£o involves computing ( |2T] i for each 
j G H which is equivalent to n instances of hnding the 
mode of a vector of length d which can be solved in 0{d) 
complexity provided a > [(i/ 2 j | 


Theorem IV.7 (Convergence of Serial-^o)- Let A G and 
let e < 1 /4, and a; G be a dissociated signal. Then, Serial- 
£o with a = (1 — 2e)d can recover x from y = Ax G K™ in 
O(nlogfc) iterations with complexity 0{d). 

Proof: The loop over j G [n] for Serial-£o identihes 
singletons T to update values in x in serial. The union of 
the singletons for j G \n] includes the set of all nodes for 
which the residual would be reduced by at least a if one were 
to forgo the serial update in x. For a = (1 — 2e)d, the proof 
of convergence for Theorem IV. 6 | establishes that this results 
in a reduction of the cardinality of supp(a: — x) by at least a 


fraction 2£d/{\ + fledf). That is, for p an integer, Serial-£o 
satishes ^ 

after £ = pn iterations, and converges to i = x after at most 
p* > log(fc)/log((l -I- [2£dJ)/(2£d)) for convergence after 

t >n ^log"^ ( ^ ~^2£d^^ ) ) 

iterations. Each iteration of Serial-fo involves computing the 
mode of a vector of length d and updating d entries in the 
residual. Since we are interested in knowing the mode of rjy(^j') 
only when the most frequent element occurs more than d/2 
times, this value can be found at cost 0{d) ll23l . ■ 


C. Discussion 

1) The computational cost of computing a mode can be 


improved if d is small: Evaluating (21 1 for a given column 
j G supp(x) is equivalent to hnding the mode of This 

can be done at cost 0{d) using the Boyer-Moore Majority 
vote algorithm 12^ . However, this algorithm requires that an 
element of the array occurs more than [(i/ 2 j times, so it might 
fail when we set a G [[(i/2j]. Our numerical experiments 
(Section show that best recovery regions are obtained for 
a — 2, so we prefer to have an algorithm with 0{d) per- 
iteration cost for all a G [d]. 

Our approach is presented in Algorithm Instead of 
looking for an w G K satisfying (21 1 for each j G [n], at the 


Ith 


iteration we consider the reduction caused by Uj, dehned 
as the £ (mod d)-th element in rjyi^jy When using this shifting 
strategy we compromise the hnal number of iterations, but we 
also keep a hxed cost of d complexity per iteration for any 
a G [d]. The convergence guarantees of our algorithms when 
using this shifting strategy are presented in Theorem |IV .8 


Algorithm 8: Computation of score for serial-^o 
parallel-^o- 

Data: j G [n]; r G M™; w G N 
Result: Sj ^ |{* G M{j) : ri= uj}\ 


Theorem IV.8 (Convergence of Shifted Parallel-£o)- Let A G 
"'Lh £ < 1/4, and x G Xfc be dissociated. Then, the 
shifted versions of Serial-^o Parallel-^g with a = (1 — 
2£)d can recover x from y = Ax G M"* in an average of 
O(dn log k) operations. 

Proof: Let x = 0 be the initial approximation to x G Xfc> 
and A G E™/"” with £ < 1/4. At £*^ iteration, let T = Ti 
be the set satisfying ( [ 2 T] i, that is, the one that Parallel-fg 
has marked for update. For j G T, let ujj be the most 
frequent element in rjy-Qy In shifted-parallel-^g^ L not 
directly computed. Instead, at iteration £, the frequency of the 
£ (mod d)-th value in is computed using Algorithmand 
tested against the imposed threshold a. In the worst case, this 
increases the number of iterations by a factor 0{d). However, 
on average, this is not the case, and convergence in O{logk) 
iterations is guaranteed. 
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To see this, let j G T and let w be drawn at random from 
rj\f{j)- Then, Pr(a; = ujj) > l — 2e, so on average at iteration 
£ we will identify [T^Kl — 2e) correct entries in supp(a; — 
x). Given the bound for |T^| ( |36l l in the proof of parallel- 
£o, we have that at each iteration we identify at least (1 — 
Therefore 


l+[2edj 

,^f{l-2e){l-2ed+[2ed\)\,^, 

- 1 ,- rrmi -'' ''■ 


(41) 


2) Our theoretical guarantees immediately apply to 
LDDSR: When e = 1/4, we have that (1 — 2e)d = d/2, 
so we recover a parallel version of LDDSR (Algorithm for 
dissociated signals. We call this algorithm Parallel-LDDSR, 
and we test its performance in Section [V] 

3) Non-dissociated signals can be recovered with a disso¬ 
ciated A: There are many signals models in which the disso¬ 
ciated condition does not hold. For instance, if x is a binary 
signal or has integer-valued nonzeros. In this case, the sensing 
matrix A can be modified to make the nonzero elements of 
X identifiable by our algorithms. In particular, scaling each 
column of the matrix by i.i.d. random numbers coming from 
a continuous distribution introduces enough information in y 
for our algorithms to correctly identify supp(x). 

4) Expander matrices preserve information of dissociated 
signals: We now discuss the concept of dissociated signals 
under an Information Theory viewpoint. To do this, suppose 
that {Xi ,..., Xk) is a vector of k random variables associated 
with {xi,... ,Xk} = supp(x) and that {Xi,... ,Xk) ^ p 
for some distribution p supported on a finite set. Note that 
condition (iii) in Definition implies that, 

Xii “t” * * * -f Xi^ ^ji “f * * * ~i-Xj^ foi i\ ji, . . . , ^ ji' 

(42) 

Now, consider the following Shannon-entropy inequalities. 

Lemma IV.9 (Entropy inequalities). For a random vari¬ 
able X ~ p, let H{-) be its Shannon entropy. Now, let 
Xi, - ■ ■ , Xk be a set of random variables with joint distri¬ 
bution {Xi,... ,Xk) ~ p. Assume that the random variable 
Xi is supported on {(xi)i,..., {xi)i}. Then, 


H{Xi + ...+Xk)< H{Xi, ...,Xk)< H{Xi) -£■■■ H{Hk) 

(43) 

With equality on the left if and only if {xi)i^ -f • • • -f {xk)ik 7 ^ 
(xi)jj -f • • • -f {xk)jk for ii 7 ^ ji^ and equality on the right if 
and only if Xi _L Xj for i f j. 


Proof: See ll^ and ||25]| for a proof. ■ 

In the case of discretely supported distributions, a dissoci¬ 
ated signal can be understood as one in which the entries on 
supp(x) are drawn according to a distribution p fulfilling. 


(i) PT[Xi = u}\Xj=u:]=0Wifj'iuj^0. 

(ii) = 0] = 0VTC [A] 

(iii) = 0 V ri,T 2 C [fc] with 

Ti 74 T2. 

Property (iii) above, together with Lemma ( IV.9| l say that 
probability distribution on the support of dissociated signals 


imply 

^ • ■ • ’ ^ ^ C [*] (44) 

And since the value of each entry in y = Ax is distributed 
according to some T C [fc], we get that 

when computing y with a A € having e < 1/4 

and a dissociated signal x, ( |44| ) will hold. This implies that 
linear transformations with expander matrices preserve the 
information in x. 

V. Numerical Experiments 

In this section we perform a series of numerical experi¬ 
ments to compare Parallel-fo and Serial-fo with state-of-the-art 
compressed sensing algorithms. These comparisons are done 
by adding Parallel-^o and Serial-^o to the GAGA software 
package ll26l which includes CUDA-C implementations of a 
number of compressed sensing algorithms as well as a testing 
environment to rapidly generate synthetic problem instances. 
This approach allows us to solve hundreds of thousands of 
randomly generated problems and to solve problems with n 
in the millions. 

Unless otherwise stated, all tests were performed with the 
nonzeros of x drawn from a standard normal distribution 
1 ) and the parameter a in Serial-fo and Parallel-£o was 
set to 2. 

Figures |2][^ were computed using a Linux machine with 
Intel Xeon E5-2643 CPUs @ 3.30 GHz, NVIDIA Tesla KIO 
GPUs, and executed from Matlab R2015a. Figures [9pT| were 
computed using a Linux machine with Intel Xeon E5-2667 v2 
CPUs @ 3.30GHz, NVIDIA Tesla K40 GPUs, and executed 
from Matlab R2015a. 


A. Substantially higher phase transitions 

The phase transition of a compressed-sensing algorithm EtII 
is the largest value k/m, which we denote p*{m/n) noting 
its dependence on m/n, for which the algorithm is typically 
(say greater than half of the instances) able recovery all k 
sparse vectors with k < mp*{m/n). The value p*{m/n) often 
converges to a fixed value as n is increased with m/n being 
a fixed fraction. Figure shows the phase transition curve for 
each of the CCS algorithms stated in Section III as well as 
Parallel-fo and Serial-£o- To facilitate comparison with non- 
CCS algorithms. Figure also includes the theoretical phase 
transition curve for -regularization for A drawn Gaussian 
ESl I 29 I . which is observed to be consistent with £ 1 - 
regularization for A G The curves were computed by 

setting n = 2^®, d = 7, and a tolerance of 10“®. The testing 


is done at m = SpU for 


Sp e {0.02p : p e [4]} U 0.1 


89 

iM) 


(p-l):p€[20] 


For each Sp, we set p = 0.01 and generate 10 synthetic 
problems to be applied to the algorithms, with x having in¬ 
dependent and identically distributed normal Gaussian entries. 
With this restrictions, our signals are dissociated. If at least 
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one such problem was recovered successfully, we increase p 
by 0.01 and repeat the experiment. The recovery data is then 
fitted using a logistic function in the spirit of El and the 
50% recovery transition of the logistic function is computed 
and shown in Figure 

Note the low phase-transition curve of SMP and the substan¬ 
tially higher phase-transition curve of Parallel-fg and Serial-fg. 
As mentioned previously, the multiple updating mechanism of 
SMP gives it sublinear convergence guarantees, but greatly 
compromises its region of recovery. We emphasise that the 
phase transition curves for Serial-fg and Parallel-^o are higher 
than those for SMP, SSMP, ER, and parallel-LDDSR. In 
particular, they are even higher than £i-regularisation for 
(5^ 0.4. 


50% phase transition curve for d = 7 with n = 2^® 



Fig. 2: 50% recovery probability logistic regression curves for 
Es^r,k and n = 218 . The curve for £i-regularisation is the 
theoretical curve for dense Gaussian ensembles, and is shown 
for reference. 

B. Fastest compressed sensing algorithm 

When the signal is dissociated, Parallel-^g is generally the 
fastest algorithm for matrices A G We show this 

numerically by computing the phase transitions of 

Serial-fo, Parallel-fo, parallel-LDDSR, ALPS, CGIHT, CSMPSP, 
ER, FIHT, HTP, NIHT, SMP, SSMP; 

and comparing their average time to convergence at each point 
of (S,p). The phase transitions are computed similarly to 
those in Figure with problem parameters of n = 21® and 
d = 7. In particular, Parallel-£o is also used with a = 2 . 
The results are shown in Figures and Specifically, Figure 

shows the time in milliseconds that the fastest algorithm 
takes to converge when the problem parameters are located at 
(S, p). The fastest algorithm is in turn identified in Figure 
where we can see that Parallel-fg is consistently the fastest 
algorithm within its phase transition, except for p ^ 1 
where parallel-LDDSR takes less time. However, we note 
that the convergence guarantees of parallel-LDDSR come as a 
byproduct of our analysis the domain in which it is faster than 
Parallel-fg is the region of least importance for applications 
as it indicates more than three fold more measurements were 
taken than would have been necessary if Parallel-^o were used. 


Time (ms) of fasfest algorithm for d = 7 with n = 2’® 



5=m/n 

Fig. 3: Average recovery time (ms) of the fastest algorithm at 
each (S,p) for Efc.ep and n = 2 ^®. 


Algorithm selection map for d = 7 with n = 2’® 



Fig. 4; Selection map of the fastest algorithm at each (<5, p) 
for Ek,e ,7 and n = 2 ^®. 

C. Parallelisation brings important speedups: examples with 
m n 

As shown in Algorithm the speed of Algorithms |3]|^ can 
be improved if the scores Sj and updates Uj are computed in 
parallel for each j G [n]. However, implementing this paralleli¬ 
sation is not enough to cut down an algorithm’s complexity to 
that of the state-of-the-art’s. Figures |5]|^ show the average time 
to exact convergence for each of the combinatorial compressed 
sensing algorithms. It can be seen in addition to Serial-fg 
and Parallel-£o having higher phase transition than ER and 
SSMP, they are also substantially faster to converge to the 
true solution for n = 2 ^® and either S = 0.01 or 5 = 0 . 1 . 
It is interesting to note that for this problem size Serial-fg is 
substantially faster than ER and SSMP, even when the two 
latter are implemented in parallel and run on a modern high 
performance computing GPU. 

D. Convergence in O{logk) iterations 

The theoretical guarantees of Serial-^g Parallel-fg state 
that convergence can be achieved in 0{nd\ogk) operations. 
The number of operations per iteration can be verified simply 
by counting operations in the algorithm, which is 0{d) for 
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Mean time to exact convergence 
m/n = 0.01 with n = and d = 7 



Fig. 5; Average recovery time (sec) with dependence on p for 
(5 = 0.01 and ^ 7 with n = 


Mean time to exact convergence 
m/n = 0.10 with n = and d = 7 



Fig. 6 : Average recovery time (sec) with dependence on p for 
(5 = 0.1 and E^. g 7 with n = 


Iterations for convergence 
m/n = 0.10 with n = 2^° and d = 7 



Fig. 7: Number of iterations to convergence for Parallel-£o> 
Serial-^o^ and parallel-LDDSR at i5 = 0.1 with E^ ^ 7 and n = 
2 ^°. The number of iterations of Serial-^o has been normalised 
by n to showcase its O{n\ogk) guarantee in the number of 
iterations. 


Iterations for convergence 
m/n = 0.10 with n = 2 ^^ and d = 7 



Fig. 8 ; Number of iterations to convergence for ER and SSMP 
at 5 = 0.1 with Efc_g ^7 and n = 2 ^°. 


Serial-^o and 0{nd) for Parallel-^o and recording the number 
of iterations. Figure shows that the number of iterations to 
convergence for Serial-fp, Parallel-fo, and parallel-LDDSR. 
The tests were performed by hxing n = 2^°, S = 0.1, and 
d — 7, and considering signals with sparsity ranging from 
p = 0.05 to p = 0.1. It can be seen in Figure[7]that the number 
of iterations to convergence is bounded by the curve f{k) = 
log k, thus verifying our claims. We also make clear that by 
Dehnition III.l Serial-£o is shown to converge in 0{n\ogk) 


iterations, but for the sake of this experiment, we normalise 
the hnal number of iterations for Serial-fg by a factor of n. 
Note the lower number of iteration by Serial-fg due to its serial 
implementation with residual updates revealing more entries 
that satisfy the reduction of the residual by a. Now, to give a 
point of comparison, we also compute the number of iterations 
for ER and SSMP, which take 0{k) iterations to converge. The 
results are shown in Figure where the same parameters as 
in Figure |7]have been used. In particular, we can see that for a 
problem with k/m = 0 . 8 , Parallel-fg takes 5 iterations, while 
ER and SSMP take about 8000 iterations to solve the same 
problem. 


E. Increasing phase transition as 5 ^ D and n —>■ oo 


It is shown in Eigure that Serial-^g and Parallel-^g have a 
very high phase transition of just over 0.3 even for very small 
values of 5. We hypothesise that this high phase transition 
persists for any hxed 5 G (0,1) provided n is sufficiently 
large. We provide numerical support of this claim in Eigure 
1^ where for hxed S — 10“^ and d = 7, we have plotted 
the average time to convergence for Parallel-fg as p increases. 
The experiment was repeated for each n G {2^^, 2^^, 2^®}, by 
initialising p — 0.01 and generating 30 problems at each p. If 
at least 50% of the problems converge we average out the time 
to convergence for successful cases, and perform the update 
p ^ p + 0.01; otherwise, we stop. Our results in Figure 
show that for 5 = 10“^, the phase transition of the algorithm 
increases with n to just over 0.3. 


Finally, in Table III we show the average timing depicted in 
Figure]^ for p = 0.05 which shows the approximate increase 
in the average computation time being proportional to n. 
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Mean time to exact convergence for parallel-10 
m/n = 10 and d = 7 



Fig. 9; Average recovery time (sec) for Parallel-fo, with depen¬ 
dence on p for i5 = 0.001 and with n € { 2 ^^, 2^"^, 2 ^®}. 


n 

time tn 

ratio Unl^n 

222 

0.0167 

3.338 

224 

0.0557 

4.163 

226 

0.2319 

- 


TABLE III: Average recovery time (sec) for Parallel-^g at p = 
0.05 and <5 = for n G {222,224,226}. 


F. Almost dissociated signals 

The analysis of Parallel-fo and Serial-fo relied on the model 
of dissociated signals Q. We explore the effect on recovery 
ability of Parallel-fo and Serial-£o as the signal model is no 
longer dissociated, with a fixed fraction of the values in x 
being equal. To do this, we consider signals x G Xk 
nonzero values composed of two bands: one in which all 
entries are equal to a fixed value drawn at random from 
a standard normal distribution A/^( 0 , 1 ), and another one in 
which each entry is drawn independently of each other from 
A/^(0, 1). Our results are shown in Figure [To| where we can see 
that as the fraction of values which are equal increases (shown 
in the hgure by the parameter band), the phase transitions 
gracefully decrease from the flat shape observed for perfectly 
dissociated signals to an increasing log-shaped curve when 
band — 0.9. Note that the overall phase transition decreases, 
with the greatest decrease for (5^1. 

G. d should be small, but not too small 

Selection of the number of nonzeros per column, d, has 
not been adressed. In our numerical experiments we have 
consistently chosen d = 7 as the left-degree of our expander. 
Our choice of d = 7 for our problem size’s order of magnitude 
is justihed by Figure [TT] where we have computed the phase 
transitions for Parallel-fo for all odd values of d between 5 
and 19. For d = 5, the phase transition of the algorithm is very 
low, thus signalling expanders of bad quality. For d = 7 the 
phase transition is substantially greater than when d = 5, and 
gradually decreases for values of d greater than seven. Note 


50% phase transition curves for paralleIJO with n = 2’® 



Fig. 10: 50% recovery probability logistic regression curves 
for Parallel-fo with £^,£,7 and n = 2 ^ 6 , with signals having 
a fixed proportion, band, of identical nonzero elements in its 
support. 

that the expander condition implies (1 — e)d/c < m which 
encourages small values of d in order that m/k can be as 
large as possible. 


50% phase transition curves for paralleIJO with n = 2^® 



Fig. 11: 50% recovery probability logistic regression curves 
for Parallel-fo with E^ ^ ^ and n = 2^® for d G 

{5,7,9,11,13,15,17,19}.’ ’ 


VI. Conclusions and Future Work 

We have proposed two algorithms for combinatorial com¬ 
pressed sensing with provable convergence guarantees in 
0{dn\ogk) operations and very high phase transitions when 
the signal x is dissociated. In particular, Parallel-£o is observed 
to be empirically the fastest algorithm in compressed sensing 
when the signal is dissociated. We have used the dissociated 
signal model in the convergence proofs, but that in practice 
one can relax this assumption and still get reasonably high 
phase transitions. 

As future work it remains to address the case of noisy 
observations, and to extend the scope of the algorithms to 
more general signal models. The proofs presented in this 
paper should extend trivially to noise which is bounded to be 
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less than half the minimal distance between obtainable values 
Xi by introducing an equivalence class. A variant which 
is robust to Gaussian noise is scope for future work. 

Appendix 

For completeness, we give a proof of Lemma |II.4| 

Proof: For any unbalanced, left d-regular, bipartite graph 
it holds that; 


|A/'i(^)| + |A/'>i(5)| = |AA(5)|, (45) 

|A(i(^)|+2|A(>i(5)| <d|5|. (46) 


Where (45 i follows from the dehnition of A/> i (5”), and (46 1 by 
double-counting the edges emanating from S to N{S). Now, 
to prove that (j^ is necessary, assume that G is a {k,£,d)- 
expander graph. Then, for S G we have that 


|AA(:5)| > (l-e)d|^|. (47) 


Combining ( |45] l, ( |46l l and ( |47| ) we get the chain of inequalities 
d|5| - |A(>i(5)| > \M{S)\ > (1 - £)d\S\, (48) 

which yield 

|AA>i(^)| <£d|5|. (49) 

Plugging (49 1 into ( [45] l and using ( |47| ) we obtain 

|AAi(S')| > (l-2e)(i|S'|. (50) 


To prove the sufficiency of 0 for graph expansion, we couple 
it with ( |46] l into the system 

(1 - 2£)d|5| < \Mi{S)\ < d|5| - 2|A(>i(5)|, (51) 


and use the left and right hand sides recover ( [49| . Now, using 
0 and ( |45] l we obtain 

|AA(5)|-|A/'>i(:5)| > (l-2e)d|5|. (52) 


And using ( [49] l in ( |52| ) allows us to recover ( [47] i, implying 
that G is a (fc, e, d)-expander graph. ■ 
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